# End-to-End Training

Coco Instance Eomt Large 1280

This paper proposes a method to reinterpret Vision Transformer (ViT) as an image segmentation model, demonstrating ViT's potential in image segmentation tasks.

Image Segmentation

Ade20k Panoptic Eomt Giant 1280

This paper proposes a method to reinterpret Vision Transformer (ViT) as an image segmentation model, revealing ViT's potential in image segmentation tasks.

Image Segmentation

Ade20k Panoptic Eomt Large 1280

This paper proposes an image segmentation model based on Vision Transformer (ViT), revealing the potential of ViT in image segmentation tasks.

Image Segmentation

Coco Panoptic Eomt Large 1280

This paper proposes a novel perspective by treating Vision Transformer (ViT) as an image segmentation model and explores its potential in image segmentation tasks.

Image Segmentation

Ade20k Semantic Eomt Large 512

This Vision Transformer model for image segmentation is based on the paper 'Your ViT is Secretly an Image Segmentation Model'.

Image Segmentation

Coco Panoptic Eomt Large 640

This model reveals the potential of Vision Transformer (ViT) in image segmentation tasks by adapting its architecture for segmentation purposes.

Image Segmentation

Coco Instance Eomt Large 640

This paper proposes a method to reinterpret Vision Transformer (ViT) as an image segmentation model, demonstrating ViT's potential in image segmentation tasks.

Image Segmentation

Coco Panoptic Eomt Giant 1280

By rethinking the architecture of Vision Transformer (ViT), this model demonstrates its potential in image segmentation tasks.

Image Segmentation
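The eight EoMT entries above differ only in the fields packed into their names. As a minimal sketch, assuming each name follows the pattern dataset, task, architecture, size, resolution (an interpretation of this listing, not something the catalog states), a hypothetical helper `parse_eomt_name` can split the names so the entries can be filtered:

```python
from typing import NamedTuple


class EomtEntry(NamedTuple):
    dataset: str     # e.g. "Coco" or "Ade20k"
    task: str        # "Instance", "Panoptic" or "Semantic"
    size: str        # "Large" or "Giant"
    resolution: int  # assumed to be the input resolution in pixels


def parse_eomt_name(name: str) -> EomtEntry:
    """Split a catalog name such as 'Coco Instance Eomt Large 1280'.

    Hypothetical helper: the meaning of each field is an assumption
    drawn from the names in this listing.
    """
    dataset, task, arch, size, resolution = name.split()
    if arch.lower() != "eomt":
        raise ValueError(f"not an EoMT entry: {name!r}")
    return EomtEntry(dataset, task, size, int(resolution))


names = [
    "Coco Instance Eomt Large 1280",
    "Ade20k Panoptic Eomt Giant 1280",
    "Ade20k Semantic Eomt Large 512",
    "Coco Panoptic Eomt Large 640",
]
entries = [parse_eomt_name(n) for n in names]

# Keep only the panoptic segmentation variants.
panoptic = [e for e in entries if e.task == "Panoptic"]
```

Splitting on whitespace is enough here because every EoMT name in the listing has exactly five space-separated parts; a name with a different shape raises `ValueError`.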

Detr Finetuned Chess

This is an object detection model based on the DETR architecture, specifically fine-tuned for chess piece recognition tasks.

Object Detection

YOLOv10x is a variant of YOLOv10, a YOLO-series model focused on real-time end-to-end object detection, described as offering higher detection accuracy and faster inference speed.

Object Detection

YOLOv10 is a real-time end-to-end object detection model developed by the Tsinghua University team as an improved iteration of the YOLO series.

Object Detection

YOLOv10 is a real-time end-to-end object detection model developed by the Tsinghua University team, representing the latest improvement in the YOLO series.

Object Detection

YOLOv10 is a real-time end-to-end object detection model proposed by Tsinghua University, known for its efficiency and accuracy.

Object Detection

Control V11p Sd15 Inpaint

ControlNet v1.1 is a neural network architecture based on diffusion models, designed to control image generation through additional conditions, particularly suited for image inpainting tasks.

Image Generation Other

Mamba 3B Slimpj

A 3B-parameter language model based on the Mamba architecture, supporting English text generation tasks.

Large Language Model

Transformers English

Detr Resnet 50 Finetuned Cppe5

A DETR object detection model fine-tuned from facebook/detr-resnet-50 on an image folder dataset.

Object Detection

Timesformer Bert Video Captioning

A video caption generation model based on Timesformer and BERT architectures, capable of generating descriptive captions for video content.

EnCodec is a real-time high-fidelity neural audio codec developed by Meta AI, supporting multiple bandwidth configurations and streaming processing.

Audio Generation

An invoice information extraction model fine-tuned from the Donut architecture, enabling OCR-free document understanding.

Detr Resnet 50 Finetuned OCR

An OCR model fine-tuned from facebook/detr-resnet-50 for object detection tasks

Text Recognition

Deformable Detr Box Supervised

Deformable DETR is an object detection model based on Transformer architecture, trained on the LVIS dataset, supporting detection of 1203 object categories.

Object Detection

Re2g Qry Encoder Fever

Re2G is a generative model combining neural initial retrieval and reranking for knowledge-intensive tasks. This question encoder is a component of the Re2G system, used to encode questions into vectors for retrieval.

Re2g Qry Encoder Nq

Re2G is an end-to-end system combining neural retrieval, reranking, and generation for knowledge-intensive tasks. This model serves as its Natural Questions (NQ) question encoder component.

Question Answering System
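The two Re2G entries above describe a question encoder that turns questions into vectors for retrieval. The sketch below illustrates that retrieval step with toy data. It assumes passages are scored by inner product against the question vector, which is a common choice for dense retrieval but is not stated in this listing. The vectors are made up, and `retrieve` is a hypothetical helper, not part of Re2G.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve(
    question_vec: Sequence[float],
    passage_vecs: Sequence[Sequence[float]],
    k: int = 2,
) -> List[Tuple[int, float]]:
    """Return (passage index, score) for the k highest-scoring passages."""
    scored = [(i, dot(question_vec, p)) for i, p in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors standing in for encoder outputs (assumption).
question = [1.0, 0.0, 1.0]
passages = [
    [1.0, 0.0, 0.5],  # points in a similar direction to the question
    [0.0, 1.0, 0.0],  # orthogonal to the question
    [0.5, 0.5, 0.5],  # partial overlap
]

top = retrieve(question, passages, k=2)
```

In a full system of the kind the entries describe, the passages returned by this step would then go to a reranker and a generator; only the initial retrieval is sketched here.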

Cifar 10 Vgg Pretrained

Image classification model implemented with PyTorch, capable of recognizing multiple common object categories

Image Classification

Wav2vec2 Base Timit Demo Colab

A speech recognition model fine-tuned on the TIMIT dataset based on facebook/wav2vec2-base, for demonstration purposes

Speech Recognition

Gunnarthor Talromur A Fastspeech2

A FastSpeech2 text-to-speech model trained with the ESPnet framework on the talromur dataset, supporting Icelandic speech synthesis.

Speech Synthesis English

Vilt B32 Finetuned Vqa

ViLT is a vision-and-language transformer model fine-tuned on the VQAv2 dataset for visual question answering tasks.

Wav2vec2 Gpt2 Wandb Grid Search

Automatic Speech Recognition (ASR) model trained on the LibriSpeech dataset

Speech Recognition

Wav2vec2 Large Xlsr Arabic Common Voice 10 Epochs

Arabic speech recognition model based on wav2vec2 architecture, trained for 10 epochs on the Common Voice dataset

Speech Recognition

A hybrid summarization model based on hierarchical reinforcement learning that combines extractive and abstractive summarization to improve informativeness and readability.

Text Generation


© 2025 AIbase